Definite noun phrases in statistical machine translation into Scandinavian languages
Abstract
The Scandinavian languages have an unusual structure for definite noun phrases (NPs): definiteness can be expressed by a noun suffix, among other options. This is problematic for statistical machine translation from languages with different NP structures. We show that translation from English and Italian into Danish, Swedish, and Norwegian can be improved by simple source-side transformations of definite NPs, with small adjustments to the preprocessing strategy depending on the language pair. We also explored target-side transformations, with mixed results.
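The abstract does not spell out the transformation itself, so the following is only a minimal sketch of what a source-side transformation of English definite NPs could look like, not the authors' implementation. It assumes Penn Treebank-style POS tags as input, a hypothetical `#DEF` marker, and a hypothetical rule that rewrites `the` only when it is immediately followed by a noun, so that the English side resembles a suffixed Scandinavian noun (compare Danish `huset`, "the house").

```python
from typing import List, Tuple

# (word, POS tag); Penn Treebank-style tags are an assumption of this sketch.
Token = Tuple[str, str]

# Common noun tags that trigger the rewrite (hypothetical choice).
NOUN_TAGS = {"NN", "NNS"}


def transform_definite_nps(tagged: List[Token], marker: str = "#DEF") -> List[str]:
    """Drop 'the' directly before a noun and append a definiteness marker to the noun.

    NPs with material between the article and the noun are left unchanged.
    """
    out: List[str] = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        nxt = tagged[i + 1] if i + 1 < len(tagged) else None
        if word.lower() == "the" and tag == "DT" and nxt is not None and nxt[1] in NOUN_TAGS:
            # Merge article and noun into one marked token, e.g. "house#DEF".
            out.append(nxt[0] + marker)
            i += 2
        else:
            out.append(word)
            i += 1
    return out


sentence = [("the", "DT"), ("house", "NN"), ("is", "VBZ"), ("old", "JJ")]
print(" ".join(transform_definite_nps(sentence)))
```

In this sketch, NPs with a premodifier such as "the old house" are deliberately left untouched, because Danish uses a preposed article there (`det gamle hus`) while Swedish combines the article with the suffix (`det gamla huset`). Which cases to rewrite is the kind of per-language-pair adjustment the abstract refers to.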
Similar resources
Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
In this thesis proposal I present my thesis work, about pre- and postprocessing for statistical machine translation, mainly into Germanic languages. I focus my work on four areas: compounding, definite noun phrases, reordering, and error correction. Initial results are positive within all four areas, and there are promising possibilities for extending these approaches. In addition I also focus on...
Determiners and Number in English contrasted with Japanese, as exemplified in Machine Translation
The fact that concepts are grammaticalized differently in different languages is a major problem for translation, especially for machine translation. Two major examples of this are syntactic number, and the use of (in)definite articles (a, some, the). In languages such as English, nouns are marked for number and the choice of article (or of no article) must be made for every noun phrase. In con...
Text Harmonization Strategies for Phrase-Based Statistical Machine Translation
In this thesis I aim to improve phrase-based statistical machine translation (PBSMT) in a number of ways by the use of text harmonization strategies. PBSMT systems are built by training statistical models on large corpora of human translations. This architecture generally performs well for languages with similar structure. If the languages are different for example with respect to word order or ...
Definite Noun Phrases in Statistical Machine Translation into Danish
There are two ways to express definiteness in Danish, which makes it problematic for statistical machine translation (SMT) from English, since the wrong realisation can be chosen. We present a part-of-speech-based method for identifying and transforming English definite NPs that would likely be expressed in a different way in Danish. The transformed English is used for training a phrase-based SM...
Reordering Constraint Based on Document-Level Context
One problem with phrase-based statistical machine translation is the problem of long-distance reordering when translating between languages with different word orders, such as Japanese-English. In this paper, we propose a method of imposing reordering constraints using document-level context. As the document-level context, we use noun phrases which significantly occur in context documents contain...
Publication date: 2011